Smart Paper Notes

Learning sparse features can lead to overfitting in neural networks

Leonardo Petrini , Francesco Cagnetta , Eric Vanden-Eijnden , Matthieu Wyart

Categories: Machine Learning (Statistics) | Machine Learning

2022-06-24

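It is widely believed that the success of deep networks lies in their ability to learn a meaningful representation of the features of the data. Yet understanding when and how this feature learning improves performance remains a challenge: for example, it is beneficial for modern architectures trained to classify images, whereas it is detrimental for fully connected networks trained for the same task on the same data. Here we propose an explanation for this puzzle by showing that feature learning can perform worse than lazy training (via the random feature kernel or the NTK), because the former can lead to a sparser neural representation. Although sparsity is known to be essential for learning anisotropic data, it is detrimental when the target function is constant or smooth along certain directions of input space. We illustrate this phenomenon in two settings: (i) regression of Gaussian random functions on the d-dimensional unit sphere and (ii) classification of benchmark image datasets. For (i), we compute the scaling of the generalization error with the number of training points and show that methods that do not learn features generalize better, even when the dimension of the input space is large. For (ii), we show empirically that learning features can indeed lead to sparse, and thereby less smooth, representations of the image predictors. This fact is plausibly responsible for the deterioration in performance, which is known to be correlated with smoothness along diffeomorphisms.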

translated by Google Translate

Relative stability toward diffeomorphisms indicates performance in deep nets

Leonardo Petrini , Alessandro Favero , Mario Geiger , Matthieu Wyart

Categories: Machine Learning | Computer Vision

2021-05-06

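Understanding why deep nets can classify data in large dimensions remains a challenge. It has been proposed that they do so by becoming stable to diffeomorphisms, yet existing empirical measurements suggest that this is often not the case. We revisit this problem by defining a maximum-entropy distribution on diffeomorphisms, which makes it possible to study typical diffeomorphisms of a given norm. We confirm that stability toward diffeomorphisms does not strongly correlate with performance on benchmark image datasets. By contrast, we find that the stability toward diffeomorphisms relative to that of generic transformations, $R_f$, correlates remarkably with the test error $\epsilon_t$. It is of order unity at initialization but decreases by several decades during training for state-of-the-art architectures. For CIFAR10 and 15 known architectures, we find $\epsilon_t \approx 0.2\sqrt{R_f}$, suggesting that obtaining a small $R_f$ is important for achieving good performance. We study how $R_f$ depends on the size of the training set and compare it to a simple model of invariant learning.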
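As a small illustration of the empirical relation quoted in this abstract, the sketch below evaluates the test error implied by the fit for a few values of $R_f$. The function name and the sample values are hypothetical. The abstract reports the prefactor 0.2 only for CIFAR10 and the 15 architectures studied, so applying it elsewhere is an assumption.

```python
import math


def predicted_test_error(r_f: float, prefactor: float = 0.2) -> float:
    """Evaluate the empirical fit eps_t ~ prefactor * sqrt(R_f).

    The default prefactor 0.2 is the value quoted in the abstract for
    CIFAR10; using it for other datasets is an assumption.
    """
    if r_f < 0:
        raise ValueError("relative stability R_f must be non-negative")
    return prefactor * math.sqrt(r_f)


# Hypothetical R_f values: order unity at initialization, smaller after training.
for r_f in (1.0, 0.1, 0.01):
    print(f"R_f = {r_f:<5} -> predicted test error = {predicted_test_error(r_f):.3f}")
```

Because the fit scales as a square root, reducing $R_f$ by two decades lowers the predicted test error by only one decade.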


Online Convex Optimization of Programmable Quantum Computers to Simulate Time-Varying Quantum Channels

Hari Hara Suthan Chittoor , Osvaldo Simeone , Leonardo Banchi , Stefano Pirandola

Categories: Artificial Intelligence | Machine Learning

2022-12-09

Simulating quantum channels is a fundamental primitive in quantum computing, since quantum channels define general (trace-preserving) quantum operations. An arbitrary quantum channel cannot be exactly simulated using a finite-dimensional programmable quantum processor, making it important to develop optimal approximate simulation techniques. In this paper, we study the challenging setting in which the channel to be simulated varies adversarially with time. We propose the use of matrix exponentiated gradient descent (MEGD), an online convex optimization method, and analytically show that it achieves a sublinear regret in time. Through experiments, we validate the main results for time-varying dephasing channels using a programmable generalized teleportation processor.
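To make the update rule behind matrix exponentiated gradient methods concrete, here is a minimal sketch of one step on a density matrix, in its standard textbook form: move in the matrix-logarithm domain, exponentiate, and renormalize the trace. The toy linear loss $\mathrm{Tr}(\rho L)$, the learning rate, and all names are assumptions for illustration. This is not the paper's objective or processor model.

```python
import numpy as np


def megd_step(rho: np.ndarray, grad: np.ndarray, lr: float) -> np.ndarray:
    """One matrix exponentiated gradient step on a density matrix.

    Computes exp(log(rho) - lr * grad) and renormalizes to unit trace,
    so the iterate stays positive semidefinite with trace one.
    """
    grad = (grad + grad.conj().T) / 2.0  # enforce a Hermitian gradient
    # Matrix logarithm of rho via its eigendecomposition (clipped for safety).
    vals, vecs = np.linalg.eigh(rho)
    log_rho = (vecs * np.log(np.clip(vals, 1e-12, None))) @ vecs.conj().T
    # Matrix exponential of the updated log-matrix, shifted for stability.
    vals, vecs = np.linalg.eigh(log_rho - lr * grad)
    weights = np.exp(vals - vals.max())
    unnormalised = (vecs * weights) @ vecs.conj().T
    return unnormalised / np.trace(unnormalised).real


# Toy example (assumed): linear loss Tr(rho @ L), whose gradient is L itself.
rho = np.eye(2) / 2.0               # start from the maximally mixed state
loss_matrix = np.diag([1.0, 0.0])   # penalizes weight on the first basis state
for _ in range(5):
    rho = megd_step(rho, loss_matrix, lr=0.5)
print(np.round(rho, 4))
```

In an online setting the gradient would change at every round. The same step applies with the current round's gradient substituted for `loss_matrix`.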


Understanding electricity prices beyond the merit order principle using explainable AI

Julius Trebbien , Leonardo Rydin Gorjão , Aaron Praktiknjo , Benjamin Schäfer , Dirk Witthaut

Categories: Machine Learning

2022-12-09

Electricity prices in liberalized markets are determined by the supply and demand for electric power, which are in turn driven by various external influences that vary strongly in time. In perfect competition, the merit order principle describes that dispatchable power plants enter the market in the order of their marginal costs to meet the residual load, i.e. the difference between load and renewable generation. Many market models implement this principle to predict electricity prices but typically require certain assumptions and simplifications. In this article, we present an explainable machine learning model for the prices on the German day-ahead market, which substantially outperforms a benchmark model based on the merit order principle. Our model is designed for the ex-post analysis of prices and thus builds on various external features. Using Shapley Additive exPlanation (SHAP) values, we can disentangle the role of the different features and quantify their importance from empirical data. Load, wind and solar generation are most important, as expected, but wind power appears to affect prices more strongly than solar power does. Fuel prices also rank highly and show nontrivial dependencies, including strong interactions with other features revealed by a SHAP interaction analysis. Large generation ramps are correlated with high prices, again with strong feature interactions, due to the limited flexibility of nuclear and lignite plants. Our results further contribute to model development by providing quantitative insights directly from data.
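The merit order principle described in this abstract can be sketched in a few lines: sort dispatchable plants by marginal cost, dispatch them until the residual load is covered, and take the marginal cost of the last plant used as the price. The plant names, capacities, and costs below are hypothetical. Real benchmark models of this kind are more detailed.

```python
def merit_order_price(plants, residual_load):
    """Return (clearing_price, dispatch) under a simple merit order.

    plants: list of (name, capacity_mw, marginal_cost) tuples.
    residual_load: load minus renewable generation, in MW.
    """
    remaining = residual_load
    dispatch = {}
    price = 0.0
    # Cheapest plants enter the market first.
    for name, capacity, cost in sorted(plants, key=lambda p: p[2]):
        if remaining <= 0:
            break
        used = min(capacity, remaining)
        dispatch[name] = used
        remaining -= used
        price = cost  # the last plant dispatched sets the price
    if remaining > 0:
        raise ValueError("dispatchable capacity cannot cover the residual load")
    return price, dispatch


# Hypothetical fleet: (name, capacity in MW, marginal cost per MWh).
plants = [("lignite", 300.0, 30.0), ("gas", 200.0, 80.0), ("nuclear", 400.0, 10.0)]
load, renewables = 1000.0, 450.0
price, dispatch = merit_order_price(plants, load - renewables)
print(price, dispatch)
```

In this toy case the residual load of 550 MW is covered by the nuclear plant and part of the lignite plant, so the lignite plant's marginal cost sets the price.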


Transformer-based normative modelling for anomaly detection of early schizophrenia

Pedro F Da Costa , Jessica Dafflon , Sergio Leonardo Mendes , João Ricardo Sato , M. Jorge Cardoso , Robert Leech , Emily JH Jones , Walter H. L. Pinaya

Categories: Machine Learning | Artificial Intelligence

2022-12-08

Despite the impact of psychiatric disorders on clinical health, early-stage diagnosis remains a challenge. Machine learning studies have shown that classifiers tend to be overly narrow in the diagnosis prediction task. The overlap between conditions leads to high heterogeneity among participants that is not adequately captured by classification models. To address this issue, normative approaches have surged as an alternative method. By using a generative model to learn the distribution of healthy brain data patterns, we can identify the presence of pathologies as deviations or outliers from the distribution learned by the model. In particular, deep generative models showed great results as normative models to identify neurological lesions in the brain. However, unlike most neurological lesions, psychiatric disorders present subtle changes widespread in several brain regions, making these alterations challenging to identify. In this work, we evaluate the performance of transformer-based normative models to detect subtle brain changes expressed in adolescents and young adults. We trained our model on 3D MRI scans of neurotypical individuals (N=1,765). Then, we obtained the likelihood of neurotypical controls and psychiatric patients with early-stage schizophrenia from an independent dataset (N=93) from the Human Connectome Project. Using the predicted likelihood of the scans as a proxy for a normative score, we obtained an AUROC of 0.82 when assessing the difference between controls and individuals with early-stage schizophrenia. Our approach surpassed recent normative methods based on brain age and Gaussian Process, showing the promising use of deep generative models to help in individualised analyses.
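The evaluation step mentioned in this abstract, scoring scans by model likelihood and summarizing separation with an AUROC, can be illustrated with a rank-based estimate. The sketch below assumes that a higher negative log-likelihood means a more anomalous scan. The scores are made up for illustration and are not data from the study.

```python
def auroc(anomaly_scores_patients, anomaly_scores_controls):
    """Rank-based AUROC: probability that a randomly chosen patient scores
    higher than a randomly chosen control (ties count as one half)."""
    wins = 0.0
    for p in anomaly_scores_patients:
        for c in anomaly_scores_controls:
            if p > c:
                wins += 1.0
            elif p == c:
                wins += 0.5
    return wins / (len(anomaly_scores_patients) * len(anomaly_scores_controls))


# Hypothetical negative log-likelihoods (higher = less typical of healthy data).
patients = [5.1, 4.8, 6.0]
controls = [4.0, 4.9, 3.5, 4.2]
score = auroc(patients, controls)
print(f"AUROC = {score:.3f}")
```

An AUROC of 0.5 would mean the normative score does not separate the groups at all, and 1.0 would mean perfect separation.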


Deep Learning Architectures for FSCV, a Comparison

Thomas Twomey , Leonardo Barbosa , Terry Lohrenz , P. Read Montague

Categories: Machine Learning

2022-12-05

We examined multiple deep neural network (DNN) architectures for suitability in predicting neurotransmitter concentrations from labeled in vitro fast scan cyclic voltammetry (FSCV) data collected on carbon fiber electrodes. Suitability is determined by the predictive performance in the "out-of-probe" case, the response to artificially induced electrical noise, and the ability to predict when the model will be errant for a given probe. This work extends prior comparisons of time series classification models by focusing on this specific task. It extends previous applications of machine learning to the FSCV task by using a much larger data set and by incorporating recent advancements in deep neural networks. The InceptionTime architecture, a deep convolutional neural network, had the best absolute predictive performance of the models tested but was more susceptible to noise. A naive multilayer perceptron architecture had the second lowest prediction error and was less affected by the artificial noise, suggesting that convolutions may not be as important for this task as one might suspect.


DimenFix: A novel meta-dimensionality reduction method for feature preservation

Qiaodan Luo , Leonardo Christino , Fernando V Paulovich , Evangelos Milios

Categories: Machine Learning

2022-11-30

Dimensionality reduction has become an important research topic as demand for interpreting high-dimensional datasets has been increasing rapidly in recent years. There have been many dimensionality reduction methods with good performance in preserving the overall relationship among data points when mapping them to a lower-dimensional space. However, these existing methods fail to incorporate the difference in importance among features. To address this problem, we propose a novel meta-method, DimenFix, which can be operated upon any base dimensionality reduction method that involves a gradient-descent-like process. By allowing users to define the importance of different features, which is considered in dimensionality reduction, DimenFix creates new possibilities to visualize and understand a given dataset. Meanwhile, DimenFix does not increase the time cost or reduce the quality of dimensionality reduction with respect to the base dimensionality reduction used.


A $k$-additive Choquet integral-based approach to approximate the SHAP values for local interpretability in machine learning

Guilherme Dean Pelegrina , Leonardo Tomazeli Duarte , Michel Grabisch

Categories: Machine Learning

2022-11-03

Besides accuracy, recent studies on machine learning models have been addressing the question of how the obtained results can be interpreted. Indeed, while complex machine learning models are able to provide very good results in terms of accuracy even in challenging applications, it is difficult to interpret them. Aiming to provide some interpretability for such models, one of the most famous methods, called SHAP, borrows the Shapley value concept from game theory in order to locally explain the predicted outcome of an instance of interest. As the calculation of SHAP values requires computations on all possible coalitions of attributes, its computational cost can be very high. Therefore, a SHAP-based method called Kernel SHAP adopts an efficient strategy that approximates such values with less computational effort. In this paper, we also address local interpretability in machine learning based on Shapley values. Firstly, we provide a straightforward formulation of a SHAP-based method for local interpretability by using the Choquet integral, which leads to both Shapley values and Shapley interaction indices. Moreover, we also adopt the concept of $k$-additive games from game theory, which helps reduce the computational effort when estimating the SHAP values. The obtained results attest that our proposal needs fewer computations on coalitions of attributes to approximate the SHAP values.
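To show why the cost grows with the number of coalitions, the following sketch computes exact Shapley values by brute force over every coalition of a small cooperative game. It is the textbook definition, not the Choquet-integral formulation or the approximation proposed in the paper. The toy game and all names are assumptions.

```python
from itertools import combinations
from math import factorial


def shapley_values(n, value):
    """Exact Shapley values for an n-player game.

    value: callable mapping a frozenset of player indices to a float.
    Every subset of the other players is visited for each player, so the
    number of evaluations grows exponentially with n.
    """
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Probability weight of a coalition of this size preceding player i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Toy game (assumed): additive payoffs plus one pairwise interaction term.
payoffs = [1.0, 2.0, 3.0]


def toy_value(coalition):
    total = sum(payoffs[j] for j in coalition)
    if {0, 1} <= coalition:
        total += 2.0  # bonus when players 0 and 1 are both present
    return total


phi = shapley_values(3, toy_value)
print(phi)
```

The toy game has no interactions among more than two players, so it is 2-additive. The pairwise bonus is split equally between the two players involved, and the values sum to the payoff of the full coalition.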


New Paradigms for Exploiting Parallel Experiments in Bayesian Optimization

Leonardo D. González , Victor M. Zavala

Categories: Machine Learning (Statistics) | Artificial Intelligence | Machine Learning

2022-10-03

Bayesian optimization (BO) is one of the most effective methods for closed-loop experimental design and black-box optimization. However, a key limitation of BO is that it is an inherently sequential algorithm (one experiment is proposed per round) and thus cannot directly exploit high-throughput (parallel) experiments. Diverse modifications to the BO framework have been proposed in the literature to enable exploitation of parallel experiments but such approaches are limited in the degree of parallelization that they can achieve and can lead to redundant experiments (thus wasting resources and potentially compromising performance). In this work, we present new parallel BO paradigms that exploit the structure of the system to partition the design space. Specifically, we propose an approach that partitions the design space by following the level sets of the performance function and an approach that exploits partially-separable structures of the performance function found. We conduct extensive numerical experiments using a reactor case study to benchmark the effectiveness of these approaches against a variety of state-of-the-art parallel algorithms reported in the literature. Our computational results show that our approaches significantly reduce the required search time and increase the probability of finding a global (rather than local) solution.


On the Generalization of Deep Reinforcement Learning Methods in the Problem of Local Navigation

Victor R. F. Miranda , Armando A. Neto , Gustavo M. Freitas , Leonardo A. Mozelli

Categories: Robotics | Machine Learning

2022-09-28

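In this paper, we study the application of DRL algorithms to local navigation problems, in which a robot equipped only with limited-range exteroceptive sensors (such as LiDAR) moves toward a goal location in unknown and cluttered workspaces. Collision avoidance policies based on DRL have some advantages, but they are quite susceptible to local minima, since their capacity to learn suitable actions is limited to the sensor range. Because most robots perform tasks in unstructured environments, it is of great interest to seek generalized local navigation policies capable of avoiding local minima, especially in untrained scenarios. To this end, we propose a novel reward function that incorporates map information gained in the training stage, increasing the agent's capacity to deliberate about the best course of action. In addition, we use the SAC algorithm to train our ANN, which proves more effective than other algorithms in the state-of-the-art literature. A set of sim-to-sim and sim-to-real experiments shows that our proposed reward combined with SAC outperforms the compared methods in terms of local minima and collision avoidance.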
